Linguistically quantified thresholding strategies for text categorization

Authors

  • Slawomir Zadrozny
  • Janusz Kacprzyk

Abstract

A new thresholding strategy for the text categorization problem is proposed. It is based on Zadeh's calculus of linguistically quantified propositions. The strategy may also be interpreted in terms of a fuzzy integral.
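
As a rough illustration of the calculus the abstract refers to, the sketch below evaluates a proposition of the form "Q of X are F" in two ways. The first uses Zadeh's formula, truth = μ_Q(mean of μ_F(x_i)). The second computes a Sugeno fuzzy integral of the same degrees with respect to the measure g(A) = μ_Q(|A|/n). This is not the authors' strategy: the quantifier `most`, the per-category evidence degrees, the 0.5 cutoff, and the functions `zadeh_truth`, `sugeno_truth` and `assign_categories` are all hypothetical. They are chosen only to show how a quantified truth value could serve as a thresholding rule, under which a category is assigned when the truth of "most of the evidence supports it" reaches the cutoff.

```python
from typing import Callable, Mapping, Sequence

Quantifier = Callable[[float], float]


def most(r: float) -> float:
    """Hypothetical relative quantifier 'most': 0 up to 0.3, 1 from 0.8, linear between."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5


def zadeh_truth(degrees: Sequence[float], q: Quantifier) -> float:
    """Zadeh's calculus: truth('Q of X are F') = q(mean membership of X in F)."""
    if not degrees:
        raise ValueError("at least one membership degree is required")
    return q(sum(degrees) / len(degrees))


def sugeno_truth(degrees: Sequence[float], q: Quantifier) -> float:
    """Sugeno integral of the degrees w.r.t. the fuzzy measure g(A) = q(|A| / n).

    With degrees sorted in decreasing order, the set of the i largest degrees
    has measure q(i / n), so the integral is max_i min(f_(i), q(i / n)).
    Assumes q is nondecreasing with q(0) = 0 and q(1) = 1.
    """
    if not degrees:
        raise ValueError("at least one membership degree is required")
    n = len(degrees)
    ordered = sorted(degrees, reverse=True)
    return max(min(f, q((i + 1) / n)) for i, f in enumerate(ordered))


def assign_categories(
    evidence: Mapping[str, Sequence[float]],
    q: Quantifier,
    threshold: float = 0.5,
    truth: Callable[[Sequence[float], Quantifier], float] = zadeh_truth,
) -> list[str]:
    """Hypothetical rule: assign category c if truth('Q of c's evidence holds') >= threshold."""
    return sorted(c for c, degrees in evidence.items() if truth(degrees, q) >= threshold)


if __name__ == "__main__":
    # Hypothetical per-category evidence degrees in [0, 1] for one document.
    evidence = {
        "sports": [0.9, 0.8, 0.7, 0.2],
        "politics": [0.6, 0.1, 0.0, 0.1],
    }
    for name, fn in (("Zadeh", zadeh_truth), ("Sugeno", sugeno_truth)):
        scores = {c: round(fn(d, most), 3) for c, d in evidence.items()}
        print(name, scores, assign_categories(evidence, most, truth=fn))
```

With this made-up data, both evaluations give about 0.7 for `sports` and stay below the cutoff for `politics` (0.0 and 0.1), so only `sports` is assigned. In general the two evaluations need not coincide; the fuzzy-integral form depends on the ordering of the degrees, not only on their mean.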

Similar resources

Genetic Algorithm Based Text Categorization Using OLEX Method

The system describes a new similarity-based genetic algorithm (GA) and new thresholding strategies (R&SCut variants). GA was designed to give appropriate weights to terms according to their semantic content and importance by using their co-occurrence information and the discriminating power values for similarity computation. After investigating the existing common thresholding strategies, design mult...

Text Categorization with a Small Number of Labeled Training Examples

This thesis describes the investigation and development of supervised and semisupervised learning approaches to similarity-based text categorization systems. It uses a small number of manually labeled examples for training and still maintains effectiveness. The purpose of text categorization is to automatically assign arbitrary raw documents to predefined categories based on their contents. Tex...

A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization

Two main research areas in statistical text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization. After investigating two similarity-based classifiers (k-NN and Rocchio) and three common thresholding techniques (RCut, PCut, and SCut), we describe...

KAN and RinSCut: Lazy Linear Classifier and Rank-in-Score Threshold in Similarity-Based Text Categorization

Two important research areas in statistical approaches for automated text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization systems. After researching common techniques in both areas, we describe a lazy linear classifier known as the keyword a...

Rigorous dimensionality reduction through linguistically motivated feature selection for text categorization

This paper introduces a new linguistically motivated feature selection technique for text categorization based on morphological analysis. It will be shown that compound parts that are constituents of many (different) noun compounds throughout a text are good and general indicators of this text’s content; they are more general in meaning than the compounds they are part of, but nevertheless have...

Publication date: 2003